I was having a look at my running data in Strava the other day and I happily noticed a good increase in my running volumes.
I wanted to analyse the data a bit further, but had to face the limitations of my free account...which doesn't allow downloads.
The data analyst in me exploded with joy at this opportunity! I have all my workout data available in my Apple Health account, where I also keep my diabetes, nutrition and other bio measurements.
Why shouldn't I build some custom Python code to replicate the Strava reports... and enrich them with all that bounty of biomarkers? A wonderful opportunity for a curious diabetic to put it all together indeed: the running, the eating, the blood glucose!
In this analysis, I give an overview of my journey back to running after a stress fracture in my ankle, how I am training to become and stay an injury-free runner for life, and how running is improving my life as a type-1 diabetic.
I do so by downloading, assessing and processing the Apple Health data exported from my Apple account via the 'HealthExport' app.
I then use the Plotly library to display several visualisations, including bar charts, line charts and treemaps.
Enough talking, let's dive in!
# importing
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

color_pal = sns.color_palette()
plt.style.use('fivethirtyeight')

# Define the paths of the input and output folders
input_folder = './running_data'
output_folder = './running_data_transformed'

print('PREPARING APPLE HEALTH CSV EXPORTS')
print(' ')

# Get a list of all CSV files in the input folder
csv_files = [f for f in os.listdir(input_folder) if f.endswith('.csv')]

# Loop through each CSV file
for file in csv_files:
    # Construct the full paths of the input and output files
    input_file = os.path.join(input_folder, file)
    output_file = os.path.join(output_folder, os.path.splitext(file)[0] + '_transformed.csv')
    # Check if the output file already exists, and skip if it does
    if os.path.exists(output_file):
        print(f'{output_file} already exists, skipping...')
        continue
    # Read in the CSV file as a dataframe
    df = pd.read_csv(input_file)
    # Normalise column names: strip, lowercase, underscores instead of spaces
    df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
    # Apply transformations
    # Remove hh-mm-ss from the date
    df['date'] = df['date'].apply(lambda x: x.split()[0])
    # Convert to datetime
    df['date'] = pd.to_datetime(df['date'])
    # Set the 'date' column as the index
    df = df.set_index('date')
    # Save the transformed dataframe as a new CSV file in the output folder
    df.to_csv(output_file, index=True)
    print(f'{input_file} processed and saved to {output_file}.')
print(' ')
print('LOADING PROCESSED APPLE HEALTH CSV EXPORTS AS DATAFRAMES')
print(' ')

# Get a list of all CSV files in the output folder
transformed_csv_files = [f for f in os.listdir(output_folder) if f.endswith('.csv')]

# Loop through each transformed CSV file
for file in transformed_csv_files:
    # Construct the full path of the CSV file
    file_path = os.path.join(output_folder, file)
    # Use the first part of the filename as the dataframe name
    df_name = os.path.splitext(file)[0].split('_')[0]
    globals()[df_name] = pd.read_csv(file_path)
    # Set the 'date' column as a datetime index in each dataframe
    globals()[df_name]['date'] = pd.to_datetime(globals()[df_name]['date'])
    globals()[df_name].set_index(['date'], inplace=True)
    print(f'{file_path} loaded as {df_name}.')
PREPARING APPLE HEALTH CSV EXPORTS
./running_data/workouts.csv processed and saved to ./running_data_transformed/workouts.csv.
./running_data/insulin.csv processed and saved to ./running_data_transformed/insulin.csv.
./running_data/energy.csv processed and saved to ./running_data_transformed/energy.csv.
./running_data/glucose2.csv processed and saved to ./running_data_transformed/glucose2.csv.
./running_data/glucose.csv processed and saved to ./running_data_transformed/glucose.csv.
LOADING PROCESSED APPLE HEALTH CSV EXPORTS AS DATAFRAMES
./running_data_transformed/insulin_transformed.csv loaded as insulin.
./running_data_transformed/workouts_transformed.csv loaded as workouts.
./running_data_transformed/glucose_transformed.csv loaded as glucose.
./running_data_transformed/energy_transformed.csv loaded as energy.
./running_data_transformed/glucose2_transformed.csv loaded as glucose2.
insulin.tail()
| insulin_delivery(iu) | purpose | |
|---|---|---|
| date | ||
| 2023-03-04 | 35.0 | Bolus |
| 2023-03-04 | 10.0 | Basal |
| 2023-03-05 | 23.0 | Bolus |
| 2023-03-05 | 10.0 | Basal |
| 2023-03-06 | 3.0 | Bolus |
workouts.tail()
| active_energy_burned(kcal) | activity | distance(km) | duration(s) | elevation:_ascended(m) | elevation:_maximum(m) | elevation:_minimum(m) | heart_rate_zone:_a_easy_(<115bpm)(%) | heart_rate_zone:_b_fat_burn_(115-135bpm)(%) | heart_rate_zone:_c_moderate_training_(135-155bpm)(%) | heart_rate_zone:_d_hard_training_(155-175bpm)(%) | heart_rate_zone:_e_extreme_training_(>175bpm)(%) | heart_rate:_average(count/min) | heart_rate:_maximum(count/min) | mets_average(kcal/hr·kg) | weather:_humidity(%) | weather:_temperature(degc) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | |||||||||||||||||
| 2023-03-21 | 59.407 | Yoga | NaN | 748.569 | NaN | NaN | NaN | 0.937 | 0.063 | 0.000 | 0.0 | 0.0 | 97.928 | 125.0 | 4.444 | 77.0 | 12.77 |
| 2023-03-22 | 51.217 | Walking | 0.780 | 586.305 | 25.27 | 72.211 | 45.940 | 1.000 | 0.000 | 0.000 | 0.0 | 0.0 | 84.696 | 97.0 | 5.921 | 82.0 | 8.66 |
| 2023-03-22 | 295.996 | Traditional Strength Training | NaN | 4177.796 | NaN | NaN | NaN | 0.873 | 0.073 | 0.054 | 0.0 | 0.0 | 94.049 | 151.0 | 2.260 | 82.0 | 8.58 |
| 2023-03-22 | 114.325 | Walking | 2.316 | 1666.656 | 25.04 | 71.709 | 45.807 | 0.998 | 0.002 | 0.000 | 0.0 | 0.0 | 74.951 | 116.0 | 3.622 | 81.0 | 9.11 |
| 2023-03-22 | 64.154 | Walking | 1.047 | 907.100 | 17.43 | 60.478 | 49.434 | 0.983 | 0.017 | 0.000 | 0.0 | 0.0 | 84.243 | 118.0 | 4.949 | 78.0 | 12.18 |
energy.tail()
| active_energy_burned(kcal) | basal_energy_burned(kcal) | carbohydrates(g) | fat_saturated(g) | fat_total(g) | protein(g) | step_count(count) | |
|---|---|---|---|---|---|---|---|
| date | |||||||
| 2023-03-18 | 1039.153 | 1544.700 | 578.0 | 9.267 | 38.0 | 106.0 | 7204.112 |
| 2023-03-19 | 1109.846 | 1541.463 | 551.0 | 8.101 | 35.0 | 101.0 | 14835.518 |
| 2023-03-20 | 973.931 | 1541.861 | 504.0 | 5.421 | 29.0 | 121.0 | 12956.085 |
| 2023-03-21 | 1140.518 | 1541.979 | 502.0 | 7.199 | 39.0 | 104.0 | 14477.000 |
| 2023-03-22 | 609.878 | 1013.088 | 433.0 | 8.930 | 32.0 | 108.0 | 7680.007 |
glucose.tail()
| blood_glucose(mg/dl) | insulin_delivery(iu) | |
|---|---|---|
| date | ||
| 2023-03-22 | 124.0 | NaN |
| 2023-03-22 | 123.0 | NaN |
| 2023-03-22 | 125.0 | NaN |
| 2023-03-22 | 125.0 | NaN |
| 2023-03-22 | 125.0 | NaN |
The bulk of the cleaning was done while loading the dataframes. Now I will just fine-tune some details specific to the datapoints I will soon be using.
My CGM records a blood glucose reading every 5 minutes, which means around 288 measurements per day.
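That 288 figure is just 24 × 60 / 5. As a quick standalone sanity check (a sketch on simulated timestamps, not the real export), counting rows per calendar day in a 5-minute series confirms it:

```python
import pandas as pd

# A reading every 5 minutes over 24 hours: 24 * 60 / 5 = 288 samples
expected = 24 * 60 // 5  # 288

# Simulated day of 5-minute timestamps standing in for the real CGM export
idx = pd.date_range('2023-03-01 00:00', periods=expected, freq='5min')
cgm = pd.DataFrame({'blood_glucose(mg/dl)': 120.0}, index=idx)

# Count readings per calendar day
readings_per_day = cgm.groupby(cgm.index.date).size()
print(readings_per_day)  # 288 for the full day
```

The same per-day count on the real export would reveal partial days (sensor changes, signal losses) as days with fewer than 288 rows.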
# Aggregate glucose and insulin data per day
glucose = glucose.groupby('date').agg({'blood_glucose(mg/dl)': 'mean',
                                       'insulin_delivery(iu)': 'sum',
                                       })
glucose.tail()
# insulin data is 0 in March due to device issues
| blood_glucose(mg/dl) | insulin_delivery(iu) | |
|---|---|---|
| date | ||
| 2023-03-18 | 138.562500 | 0.0 |
| 2023-03-19 | 140.038194 | 0.0 |
| 2023-03-20 | 157.086806 | 0.0 |
| 2023-03-21 | 135.003472 | 0.0 |
| 2023-03-22 | 152.090323 | 0.0 |
The workout duration is expressed in seconds. I will convert it to minutes.
# from seconds to minutes
workouts['duration(s)'] = workouts['duration(s)'].apply(lambda x: x / 60)
workouts.rename({'duration(s)': 'duration(m)'}, axis=1, inplace=True)
# Look for the 'duration(m)' column
workouts.tail()
| active_energy_burned(kcal) | activity | distance(km) | duration(m) | elevation:_ascended(m) | elevation:_maximum(m) | elevation:_minimum(m) | heart_rate_zone:_a_easy_(<115bpm)(%) | heart_rate_zone:_b_fat_burn_(115-135bpm)(%) | heart_rate_zone:_c_moderate_training_(135-155bpm)(%) | heart_rate_zone:_d_hard_training_(155-175bpm)(%) | heart_rate_zone:_e_extreme_training_(>175bpm)(%) | heart_rate:_average(count/min) | heart_rate:_maximum(count/min) | mets_average(kcal/hr·kg) | weather:_humidity(%) | weather:_temperature(degc) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | |||||||||||||||||
| 2023-03-21 | 59.407 | Yoga | NaN | 12.476150 | NaN | NaN | NaN | 0.937 | 0.063 | 0.000 | 0.0 | 0.0 | 97.928 | 125.0 | 4.444 | 77.0 | 12.77 |
| 2023-03-22 | 51.217 | Walking | 0.780 | 9.771750 | 25.27 | 72.211 | 45.940 | 1.000 | 0.000 | 0.000 | 0.0 | 0.0 | 84.696 | 97.0 | 5.921 | 82.0 | 8.66 |
| 2023-03-22 | 295.996 | Traditional Strength Training | NaN | 69.629933 | NaN | NaN | NaN | 0.873 | 0.073 | 0.054 | 0.0 | 0.0 | 94.049 | 151.0 | 2.260 | 82.0 | 8.58 |
| 2023-03-22 | 114.325 | Walking | 2.316 | 27.777600 | 25.04 | 71.709 | 45.807 | 0.998 | 0.002 | 0.000 | 0.0 | 0.0 | 74.951 | 116.0 | 3.622 | 81.0 | 9.11 |
| 2023-03-22 | 64.154 | Walking | 1.047 | 15.118333 | 17.43 | 60.478 | 49.434 | 0.983 | 0.017 | 0.000 | 0.0 | 0.0 | 84.243 | 118.0 | 4.949 | 78.0 | 12.18 |
I will subset the 'workouts' dataframe to only include Running activities, and make a 'running' dataframe out of it.
running = workouts.query("activity=='Running'")
running.tail()
| active_energy_burned(kcal) | activity | distance(km) | duration(m) | elevation:_ascended(m) | elevation:_maximum(m) | elevation:_minimum(m) | heart_rate_zone:_a_easy_(<115bpm)(%) | heart_rate_zone:_b_fat_burn_(115-135bpm)(%) | heart_rate_zone:_c_moderate_training_(135-155bpm)(%) | heart_rate_zone:_d_hard_training_(155-175bpm)(%) | heart_rate_zone:_e_extreme_training_(>175bpm)(%) | heart_rate:_average(count/min) | heart_rate:_maximum(count/min) | mets_average(kcal/hr·kg) | weather:_humidity(%) | weather:_temperature(degc) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | |||||||||||||||||
| 2023-03-10 | 515.472 | Running | 9.104 | 61.980817 | 45.27 | 84.313 | 45.149 | 0.017 | 0.111 | 0.659 | 0.213 | 0.0 | 146.439 | 163.0 | 8.710 | 87.0 | 8.07 |
| 2023-03-14 | 177.655 | Running | 3.085 | 23.411883 | 38.89 | 132.882 | 46.554 | 0.322 | 0.341 | 0.337 | 0.000 | 0.0 | 122.366 | 144.0 | 8.031 | 89.0 | 9.53 |
| 2023-03-14 | 219.222 | Running | 4.182 | 33.298567 | 4.99 | 88.997 | 44.477 | 0.305 | 0.413 | 0.282 | 0.000 | 0.0 | 123.195 | 146.0 | 7.114 | 91.0 | 9.09 |
| 2023-03-17 | 587.974 | Running | 8.818 | 60.425883 | 75.97 | 92.401 | 45.330 | 0.010 | 0.284 | 0.706 | 0.000 | 0.0 | 139.452 | 153.0 | 10.044 | 68.0 | 8.41 |
| 2023-03-21 | 534.337 | Running | 9.070 | 60.309700 | 51.23 | 96.909 | 45.033 | 0.000 | 0.054 | 0.401 | 0.545 | 0.0 | 156.176 | 173.0 | 9.190 | 84.0 | 8.66 |
running.columns
Index(['active_energy_burned(kcal)', 'activity', 'distance(km)', 'duration(m)',
'elevation:_ascended(m)', 'elevation:_maximum(m)',
'elevation:_minimum(m)', 'heart_rate_zone:_a_easy_(<115bpm)(%)',
'heart_rate_zone:_b_fat_burn_(115-135bpm)(%)',
'heart_rate_zone:_c_moderate_training_(135-155bpm)(%)',
'heart_rate_zone:_d_hard_training_(155-175bpm)(%)',
'heart_rate_zone:_e_extreme_training_(>175bpm)(%)',
'heart_rate:_average(count/min)', 'heart_rate:_maximum(count/min)',
'mets_average(kcal/hr·kg)', 'weather:_humidity(%)',
'weather:_temperature(degc)'],
dtype='object')
# renaming target columns
running = running.rename({'active_energy_burned(kcal)': 'kcal_burned',
                          'heart_rate_zone:_a_easy_(<115bpm)(%)': 'zone1_(<115bpm)(%)',
                          'heart_rate_zone:_b_fat_burn_(115-135bpm)(%)': 'zone2_(115-135bpm)(%)',
                          'heart_rate_zone:_c_moderate_training_(135-155bpm)(%)': 'zone3_(135-155bpm)(%)',
                          'heart_rate_zone:_d_hard_training_(155-175bpm)(%)': 'zone4_(155-175bpm)(%)',
                          'heart_rate_zone:_e_extreme_training_(>175bpm)(%)': 'zone5_(>175bpm)(%)',
                          'heart_rate:_average(count/min)': 'avg_HR',
                          'heart_rate:_maximum(count/min)': 'max_HR'},
                         axis=1,
                         # inplace=True commented out to avoid setting a value on a copy of a slice of the dataframe
                         )
# removing extra columns
running = running.drop(['elevation:_ascended(m)',
                        'elevation:_maximum(m)',
                        'elevation:_minimum(m)',
                        'mets_average(kcal/hr·kg)',
                        'weather:_humidity(%)',
                        'weather:_temperature(degc)'],
                       axis=1,
                       # inplace=True commented out to avoid setting a value on a copy of a slice of the dataframe
                       )
Some days, like '2023-03-14', my Apple Watch recorded two activities during the same session. I will sum them up.
# Aggregate data for workouts in same day
running_grouped = running.groupby('date').agg({'kcal_burned': 'sum', # sum of calories, distance and duration
'distance(km)': 'sum',
'duration(m)': 'sum',
'zone1_(<115bpm)(%)': 'mean', # average of the other values
'zone2_(115-135bpm)(%)': 'mean',
'zone3_(135-155bpm)(%)': 'mean',
'zone4_(155-175bpm)(%)': 'mean',
'zone5_(>175bpm)(%)': 'mean',
'avg_HR':'mean',
'max_HR':'mean'})
running_grouped.tail()
| kcal_burned | distance(km) | duration(m) | zone1_(<115bpm)(%) | zone2_(115-135bpm)(%) | zone3_(135-155bpm)(%) | zone4_(155-175bpm)(%) | zone5_(>175bpm)(%) | avg_HR | max_HR | |
|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||
| 2023-03-03 | 604.088 | 8.329 | 60.453483 | 0.0160 | 0.270 | 0.7140 | 0.000 | 0.0 | 137.3540 | 152.0 |
| 2023-03-10 | 515.472 | 9.104 | 61.980817 | 0.0170 | 0.111 | 0.6590 | 0.213 | 0.0 | 146.4390 | 163.0 |
| 2023-03-14 | 396.877 | 7.267 | 56.710450 | 0.3135 | 0.377 | 0.3095 | 0.000 | 0.0 | 122.7805 | 145.0 |
| 2023-03-17 | 587.974 | 8.818 | 60.425883 | 0.0100 | 0.284 | 0.7060 | 0.000 | 0.0 | 139.4520 | 153.0 |
| 2023-03-21 | 534.337 | 9.070 | 60.309700 | 0.0000 | 0.054 | 0.4010 | 0.545 | 0.0 | 156.1760 | 173.0 |
# saving the processed dataframe to the folder
running_grouped.to_csv(os.path.join(output_folder, 'running_transformed.csv'), index=True)
We're ready for some dashboarding! I will divide the analysis into the following chapters.
I will display three simple bar charts to show the evolution of my workouts. These will also include a line showing a rolling average of the metric (over the last 14 weekly datapoints), which is helpful to put the progression into a longer-term context.
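The smoothing idea can be sketched on a toy weekly series (made-up numbers, not my actual data): pandas' `rolling(...).mean()` smooths the series, and a `min_periods` argument lets the line start before a full window of weeks is available.

```python
import pandas as pd

# Toy weekly distance series standing in for the real data
weekly = pd.Series(
    [5, 8, 6, 10, 12, 9, 14, 15],
    index=pd.date_range('2023-01-01', periods=8, freq='W'),
    name='distance(km)',
)

# Rolling mean over a 4-week window; min_periods=1 lets the line
# start from the very first week instead of producing NaNs there.
smooth = weekly.rolling(window=4, min_periods=1).mean()
print(smooth.round(2))  # 5.0, 6.5, 6.33, 7.25, 9.0, 9.25, 11.25, 12.5
```

In the dashboard code further down, the same pattern is applied to the weekly totals with a wider window.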
Since I am in a recovery/back-from-injury period, my runs need to be at a conversational pace (heart rate zones 1 and 2).
Running, in this phase, will still be low in volume relative to walking. I am using walking as my true endurance builder at the moment: the perfect activity for spending a lot of time on my legs, allowing the joints to adapt and stay in motion without taxing them while they heal.
I will use treemaps to show the portion of time spent in the various heart rate zones, as well as the relative proportion of running to walking.
I will combine my run data with my DexcomOne (Continuous Glucose Monitor) data in a line chart to see how the two are evolving.
I will create graphs in an interactive dashboard using the Plotly library.
run = pd.read_csv(
os.path.join(output_folder, 'running_transformed.csv'),
parse_dates = ['date'],
#infer_datetime_format=True,
index_col = ['date']
)
run.head()
| kcal_burned | distance(km) | duration(m) | zone1_(<115bpm)(%) | zone2_(115-135bpm)(%) | zone3_(135-155bpm)(%) | zone4_(155-175bpm)(%) | zone5_(>175bpm)(%) | avg_HR | max_HR | |
|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||
| 2022-11-04 | 36.571 | 0.668 | 4.154350 | 0.041 | 0.184 | 0.775 | 0.0 | 0.0 | 139.163 | 150.0 |
| 2022-11-19 | 35.645 | 0.669 | 4.632467 | 0.500 | 0.500 | 0.000 | 0.0 | 0.0 | 120.250 | 133.0 |
| 2022-11-22 | 40.238 | 0.669 | 4.396233 | 0.000 | 0.510 | 0.490 | 0.0 | 0.0 | 136.157 | 146.0 |
| 2022-12-05 | 38.641 | 0.657 | 4.295183 | 0.020 | 0.451 | 0.529 | 0.0 | 0.0 | 135.353 | 143.0 |
| 2022-12-06 | 37.179 | 0.641 | 3.996717 | 0.044 | 0.222 | 0.734 | 0.0 | 0.0 | 136.867 | 147.0 |
I started to run again after my injury in the second half of December 2022.
As part of my comeback from recovery, during the month of December I ran twice a week.
"Running" is probably not the right word: each session lasted around 30-40 minutes, covering a distance of approximately 5 km, and always alternating walking and running intervals (for example: 3 minutes running, 2 minutes walking, repeated 8-10 times).
This gentle approach allowed me to put some kilometres in my legs without stressing my healing ankle too much.
From 21 January 2023, I started the Running School's 12-week program (the blue area in the charts), ramping up my workout schedule to two 1-hour sessions a week.
The increase in distance was rapid but still part of a gentle progression.
Each running workout provided by the coaches was meant to develop a different aspect of technique, such as cadence, rhythm, balance and posture.
Walk-run intervals were still the core of the training, especially during the first three weeks, until the end of February.
#import libraries
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
# Define list of metrics to display in graphs
metrics = [
'distance(km)',
'duration(m)',
'kcal_burned'
]
# creating a dataframe for weekly data
df2 = run[metrics].copy()
df2.index = pd.to_datetime(df2.index)
weekly_df = (
    df2
    .resample('W')         # weekly buckets
    .sum()                 # summing each week's data
    .round().astype(int))  # making it more reader-friendly
# function that will generate a Plotly chart based on the user's selection
def generate_line_chart(metric):
    """
    This function takes a metric from the run dataset.
    It generates a Plotly bar chart with the date on the x-axis and the
    metric on the y-axis, plus a rolling-average line.
    """
    fig = px.line(weekly_df,
                  x=weekly_df.index,
                  y=weekly_df[metric].rolling(14, 7).mean(),  # rolling average over the last 14 weekly datapoints (min. 7)
                  color_discrete_sequence=['#4169E1'],
                  )
    fig.add_trace(go.Bar(x=weekly_df.index,
                         y=weekly_df[metric],
                         name=metric),
                  )
    # Annotating the peak
    peak_index = weekly_df[metric].idxmax()
    fig.add_annotation(x=peak_index,
                       y=weekly_df.loc[peak_index, metric],
                       text="Peak",
                       showarrow=True,
                       arrowhead=1)
    # Applying new theme and adding a title
    fig.update_layout(template='ggplot2',
                      title=f"Total weekly {metric.capitalize()} - Running",
                      xaxis_title="Date",
                      yaxis_title=metric,
                      showlegend=True,
                      shapes=[  # Adding a shape to highlight the Running School period
                          dict(
                              type='rect',
                              xref='x',
                              yref='paper',
                              x0='2023-01-21',
                              y0=0,
                              x1='2023-03-30',
                              y1=1,
                              fillcolor='blue',
                              opacity=0.2,
                              layer='below',
                              line_width=0
                          )
                      ])
    fig.show()
for metric in metrics:
    generate_line_chart(metric)
The increasing intensity of my workouts during the Running School (blue area) goes hand in hand with the calories burned in each session and with the duration of workouts.
Two 1-hour workouts a week since the end of January makes it around 2 hours of running each week. Before then, my 'runs' lasted 30-45 minutes, at a very easy pace.
Essentially, since I started the Running School, my weekly running time went from 30-45 minutes to around two hours.
I have also been very cautious with the rhythm of my runs, ensuring that I don't push too hard and impair my body's recovery.
Monitoring my heart rate zones is crucial for this, and I try to spend roughly 80% of my workouts between Zone 2 (115-135bpm), the infamous 'conversational pace', and Zone 3 (135-155bpm)!
Let's have a look:
run.columns
Index(['kcal_burned', 'distance(km)', 'duration(m)', 'zone1_(<115bpm)(%)',
'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)', 'avg_HR', 'max_HR'],
dtype='object')
# Subsetting dataframe
hrzones = run[['zone1_(<115bpm)(%)',
               'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
               'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)']].loc[run.index > '2023-01-01'].mean()
avg_hr = pd.DataFrame(hrzones.reset_index())
avg_hr.columns = ['HR_Zone', 'avg_time_in_zone']
avg_hr['avg_time_in_zone(%)'] = (avg_hr['avg_time_in_zone'] * 100).astype(int)  # truncate to whole percentages
# avg_hr
Since I spent no time in zone5, I will drop that row to avoid issues when displaying the treemap.
avg_hr = avg_hr.loc[~(avg_hr['avg_time_in_zone(%)']==0)]
avg_hr
| HR_Zone | avg_time_in_zone | avg_time_in_zone(%) | |
|---|---|---|---|
| 0 | zone1_(<115bpm)(%) | 0.227636 | 22 |
| 1 | zone2_(115-135bpm)(%) | 0.415977 | 41 |
| 2 | zone3_(135-155bpm)(%) | 0.316477 | 31 |
| 3 | zone4_(155-175bpm)(%) | 0.039909 | 3 |
#import libraries
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
fig = px.treemap(avg_hr,
path=['HR_Zone'],
values='avg_time_in_zone(%)',
color='avg_time_in_zone(%)',
color_continuous_scale='Blues'
)
fig.update_layout(template='ggplot2',
title="Time in HR zones (%)",
#treemapcolorway = ["blue"],
)
fig.update_traces(# root_color="green",
                  marker=dict(cornerradius=5),
                  labels=["Zone 1", "Zone 2", "Zone 3", "Zone 4"],  # Zone 5 was dropped above
                  values=list(avg_hr['avg_time_in_zone(%)']),
                  textinfo="label+value"
                  )
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
And indeed, I am spending most of my running time in Zone 2 (41%) and Zone 1 (22%), which combined make up a good 63% of running done at a conversational pace... perfect for a recovery!
I will now split the runs based on a new column, 'is_RS': 'N' means it was not a Running School day, 'Y' means it was!
(I could have done this earlier, before creating the bar charts. I kept the function as it was: one problem, two approaches to solve it!)
import numpy as np
run['is_RS'] = np.where(run.index<'2023-01-21', 'N', 'Y')
# Subsetting dataframe (numeric zone columns only, so the string column
# 'is_RS' doesn't trigger a nuisance-column FutureWarning in mean())
hrzones_RS = run.loc[run['is_RS'] == 'Y', ['zone1_(<115bpm)(%)',
                                           'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
                                           'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)']].mean()
import plotly.graph_objs as go
fig = go.Figure()
fig.add_trace(go.Scatter(
x=run.loc[run['is_RS']=='N'].index,
y=run.loc[run['is_RS']=='N']['avg_HR'],
mode='lines',
name="Before Running School")
)
fig.add_trace(go.Scatter(
x=run.loc[run['is_RS']=='Y'].index,
y=run.loc[run['is_RS']=='Y']['avg_HR'],
mode='lines',
name="During Running School")
)
# Annotating the peak
peak_index = run['avg_HR'].idxmax()
fig.add_annotation(x=peak_index,
y=run.loc[peak_index, 'avg_HR'],
text="Peak",
showarrow=True,
arrowhead=1)
# Before and After RS
fig.add_shape(type='line', x0='2023-01-21', y0=0, x1='2023-01-21', y1=max(run['max_HR'])-5,
line=dict(color='black', width=3, dash='dash'))
# Add an annotation on the vertical line
fig.add_annotation(x='2023-01-21', y=max((run['max_HR'])),
text='Start of Running School',
showarrow=True, arrowhead=1
)
# Applying new theme and adding a title
fig.update_layout(template='ggplot2',
                  title="Average Heart Rate",
                  xaxis_title="Date",
                  yaxis_title='avg_HR'
                  )
fig.show()
Spending time on the legs, broadly speaking, is the best way to recover them and keep them trained in a way that is safe and not conducive to injury.
The 'couch to 5K' protocol sounds great, but when the body is not used to being and staying in motion, it can be dangerous. In fact, that is where injuries come from: one never runs nor walks, stays seated the whole day, barely works the joints. Then one goes for a run, suddenly increasing the miles on a body not yet fit for the effort, and voilà: injury!
I learned this the hard way, and that's why I just walk all the time. Grocery shopping, office commutes, chores of any kind... you name it! Unless the distance is prohibitive or I have time constraints, I'll walk my way there.
How much walking are we talking about? Let's have a look!
# Summary of cardio activities over the last three months
cardio_activities = ['Walking', 'Running', 'Running (Indoor)', 'Cycling', 'Cycling (Indoor)']
cardio_df = workouts.loc[workouts["activity"].isin(cardio_activities)][['activity','active_energy_burned(kcal)','duration(m)' ]]
cardio_df = cardio_df.groupby('activity').sum().reset_index().sort_values(by=['duration(m)'], ascending=True)
cardio_df["%_of_total_time"]=(cardio_df['duration(m)'] / cardio_df['duration(m)'].sum())*100
cardio_df = cardio_df.round()
# Combining cycling and running data into one row for each
## Rename activity
cardio_df['activity'] = cardio_df['activity'].replace({'Cycling (Indoor)': 'Cycling',
'Running (Indoor)': 'Running'})
## Groupby and sum activity
cardio_df = cardio_df.groupby(['activity']).sum().reset_index()
cardio_df
| activity | active_energy_burned(kcal) | duration(m) | %_of_total_time | |
|---|---|---|---|---|
| 0 | Cycling | 21878.0 | 2510.0 | 20.0 |
| 1 | Running | 11643.0 | 1473.0 | 12.0 |
| 2 | Walking | 34127.0 | 8554.0 | 68.0 |
fig = px.treemap(cardio_df,
path=['activity'],
values='%_of_total_time',
color='%_of_total_time',
color_continuous_scale='Blues'
)
fig.update_layout(template='ggplot2',
title="Total Time on Legs (by activity type)",
#treemapcolorway = ["blue"],
)
fig.update_traces(root_color="green",
marker=dict(cornerradius=5),
labels = ['Cycling', 'Running', 'Walking'],
values = list(cardio_df['%_of_total_time']),
textinfo = "label+value"
)
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()
As you can see, although my running workouts have been increasing in frequency and distance, I spend close to 70% of my time just walking! That's where endurance is built: by keeping the legs, the joints and the whole body constantly active!
My past injuries (knee pain and a stress fracture in the ankle) had one clear cause: one day in 2019 I decided I would run 10 km every day. I had almost never run with that frequency, nor was I used to the distance. Although I could handle the individual workouts, my body only lasted one month before it started to tear apart.
You don't become a runner just because one day, out of the blue, you decide to run. You become a runner by designing your life in a way that is conducive to maintaining a healthy body, which in turn allows running.
Randomly standing up from the couch and lacing up the shoes may be good for getting started and finding some initial motivation, but running in the long term requires a body able to handle the effort.
That is why I walk so much (and do strength training and joint training!). Stay ready so you don't have to get ready!
Is my daily average blood glucose getting better as I increase the length and intensity of my running workouts?
Let's have a closer look, bringing to the graph the data coming from my DexcomOne (Continuous Glucose Monitor).
fig = go.Figure()
# add the blood_glucose line trace to the plot
fig.add_trace(
go.Scatter(x=glucose.index,
y=glucose['blood_glucose(mg/dl)'].rolling(7, 1).mean(), # 7-day moving average
mode='lines',
name='Average blood glucose',
yaxis="y1", # first y-axis
)
)
# add the distance line trace to the plot
fig.add_trace(
go.Scatter(x=run.index,
y=run['distance(km)'].rolling(7, 1).mean(), # 7-day moving average
mode='lines',
name='Total weekly distance',
yaxis="y2", # second y-axis
)
)
# Before and After RS
fig.add_shape(type='line', x0='2023-01-21', y0=0, x1='2023-01-21', y1=max(glucose['blood_glucose(mg/dl)'])-10,
line=dict(color='black', width=3, dash='dash'))
# Annotating the RS
fig.add_annotation(x='2023-01-21', y=max(glucose['blood_glucose(mg/dl)']),
text='Start of Running School',
showarrow=True, arrowhead=1
)
# show both y-axes in their own y-scale
fig.update_layout(
template='ggplot2',
title='Blood glucose vs Running Distance',
xaxis_title='Week',
yaxis=dict(
title='blood_glucose(mg/dl)',
#titlefont=dict(color='blue'),
#tickfont=dict(color='blue')
),
yaxis2=dict(
title='distance(km)',
#titlefont=dict(color='red'),
#tickfont=dict(color='red'),
overlaying='y',
side='right'
)
)
# show the plot
fig.show()
Interesting! As my running increased (shown by the rising distance line), the average daily blood glucose decreased. That's the best thing that has emerged from this analysis!
I am writing more about my plant-based nutrition as a type-1 diabetic endurance athlete. If you're curious, you can have a look here.